
Similarity judgment



Neural Information Processing Systems

They live in the same physical world and are intimately familiar with the materials that comprise it, but they would have significant difficulty expressing their values and generalizing the results of an experiment they observe together. The alchemist would likely learn poorly from examples of a reaction demonstrated by the chemist, not having the right inductive biases for the way the world actually works.


Supplementary material for "Improving neural network representations using human similarity judgments"

Neural Information Processing Systems

Recoverable excerpts from the supplement: Figure A.1 reports on the hyperparameter combinations considered in the grid search, and the experiments used approximately 5,600 CPU-hours of 2.90 GHz Intel Xeon Gold compute. The supplement outlines the anomaly detection (AD) setting in more detail, defining both a "one-vs-rest" setting and a leave-one-out (LOO) setting over a dataset's classes (e.g., CIFAR-10), with a shared protocol for evaluating model representations in each. Table B.1 lists the pairs of individual THINGS items that change the most in distance, ranked by the relative change in cosine distance: the top items move much closer together under naive alignment, while the bottom ones move much farther apart; one such pair, involving "stethoscope", is described as semantically unrelated but perhaps slightly visually similar. Figure B.1 examines how the global structure of the representations changes after alignment.
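The Table B.1 ranking is, in outline, a simple computation over two embedding matrices. Below is a minimal sketch, assuming `emb_before` and `emb_after` are (n_items, d) NumPy arrays of item embeddings before and after alignment and `names` holds the corresponding THINGS item labels; these variable names and the exact ranking convention are assumptions, not the authors' code.

```python
import numpy as np
from itertools import combinations

def cosine_distance(u, v):
    """Cosine distance: 1 - cos(u, v)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def rank_pairs_by_distance_change(emb_before, emb_after, names):
    """Rank all item pairs by the relative change in cosine distance
    after alignment (most negative first = pairs that moved closest).
    Assumes no pair has zero distance before alignment."""
    rows = []
    for i, j in combinations(range(len(names)), 2):
        d_before = cosine_distance(emb_before[i], emb_before[j])
        d_after = cosine_distance(emb_after[i], emb_after[j])
        rel_change = (d_after - d_before) / d_before
        rows.append((names[i], names[j], d_before, d_after, rel_change))
    return sorted(rows, key=lambda r: r[-1])  # ascending relative change
```

The head of the returned list would correspond to Table B.1's top rows (pairs that move much closer together under naive alignment), the tail to the pairs that move much farther apart.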



Learning Human-like Representations to Enable Learning Human Values

Wynn, Andrea H.

Neural Information Processing Systems

How can we build AI systems that learn any set of individual human values quickly and safely, without causing harm or violating societal standards for acceptable behavior during the learning process? We explore how representational alignment between humans and AI agents affects an agent's ability to learn human values.
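Representational alignment between a human and an agent is commonly quantified by comparing their pairwise similarity structures over a shared set of stimuli. The sketch below shows one standard measure (representational similarity analysis with a Spearman correlation); the variable names and the choice of measure are assumptions for illustration, not necessarily what this paper uses.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def representational_alignment(human_emb, agent_emb):
    """Spearman correlation between two systems' pairwise cosine-distance
    structures (RDMs) computed over the same set of stimuli."""
    human_rdm = pdist(human_emb, metric="cosine")  # condensed distance vector
    agent_rdm = pdist(agent_emb, metric="cosine")
    rho, _ = spearmanr(human_rdm, agent_rdm)
    return rho
```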







Uncovering the Computational Ingredients of Human-Like Representations in LLMs

Studdiford, Zach, Rogers, Timothy T., Mukherjee, Kushin, Suresh, Siddharth

arXiv.org Artificial Intelligence

The ability to translate diverse patterns of inputs into structured patterns of behavior has been thought to rest on both humans' and machines' ability to learn robust representations of relevant concepts. The rapid advancement of transformer-based large language models (LLMs) has led to a diversity of computational ingredients -- architectures, fine-tuning methods, and training datasets among others -- but it remains unclear which of these ingredients are most crucial for building models that develop human-like representations. Further, most current LLM benchmarks are not suited to measuring representational alignment between humans and models, making benchmark scores unreliable for assessing whether current LLMs are making progress towards becoming useful cognitive models. We address these limitations by first evaluating a set of over 70 models that widely vary in their computational ingredients on a triplet similarity task, a method well established in the cognitive sciences for measuring human conceptual representations, using concepts from the THINGS database. Comparing human and model representations, we find that models that undergo instruction fine-tuning and which have larger attention-head dimensionality are among the most human-aligned, while multimodal pretraining and parameter size have limited bearing on alignment. Correlations between alignment scores and scores on existing benchmarks reveal that while some benchmarks (e.g., MMLU) are better suited than others (e.g., MUSR) for capturing representational alignment, no existing benchmark fully accounts for the variance in alignment scores, demonstrating their insufficiency in capturing human-AI alignment. Taken together, our findings help highlight the computational ingredients most essential for advancing LLMs towards models of human conceptual representation and address a key benchmarking gap in LLM evaluation.
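To make the triplet similarity evaluation concrete: a model is scored by how often its embedding similarities reproduce human odd-one-out choices. The sketch below is a minimal illustration, not the authors' code; `emb` maps concept names to vectors and `human_triplets` holds (a, b, c, odd_one_out) records, both hypothetical names.

```python
import numpy as np

def cos_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def predict_odd_one_out(emb, a, b, c):
    """Pick the item excluded from the most similar pair: if (a, b) are
    the most similar pair of the three, then c is the odd one out."""
    excluded_by_sim = {
        c: cos_sim(emb[a], emb[b]),
        b: cos_sim(emb[a], emb[c]),
        a: cos_sim(emb[b], emb[c]),
    }
    return max(excluded_by_sim, key=excluded_by_sim.get)

def alignment_score(emb, human_triplets):
    """Fraction of triplets where the model's predicted odd-one-out
    matches the human choice."""
    hits = sum(predict_odd_one_out(emb, a, b, c) == odd
               for a, b, c, odd in human_triplets)
    return hits / len(human_triplets)
```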


Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports

Du, Changde, Lu, Yizhuo, Huang, Zhongyu, Sun, Yi, Zhou, Zisen, Qin, Shaozheng, He, Huiguang

arXiv.org Artificial Intelligence

The ability to represent emotion plays a significant role in human cognition and social interaction, yet the high-dimensional geometry of this affective space and its neural underpinnings remain debated. A key challenge, the 'behavior-neural gap,' is the limited ability of human self-reports to predict brain activity. Here we test the hypothesis that this gap arises from the constraints of traditional rating scales and that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Using AI models as 'cognitive agents,' we collected millions of triplet odd-one-out judgments from a multimodal large language model (MLLM) and a language-only model (LLM) in response to 2,180 emotionally evocative videos. We found that the emergent 30-dimensional embeddings from these models are highly interpretable and organize emotion primarily along categorical lines, yet in a blended fashion that incorporates dimensional properties. Most remarkably, the MLLM's representation predicted neural activity in human emotion-processing networks with the highest accuracy, outperforming not only the LLM but also, counterintuitively, representations derived directly from human behavioral ratings. This result supports our primary hypothesis and suggests that sensory grounding--learning from rich visual data--is critical for developing a truly neurally-aligned conceptual framework for emotion. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations, offering a powerful paradigm to bridge the gap between subjective experience and its neural substrates. Project page: https://reedonepeck.github.io/ai-emotion.github.io/.
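Comparing representations against brain data of this kind is typically an encoding analysis: the embedding dimensions are regressed onto neural responses and scored by cross-validated prediction accuracy. The sketch below illustrates that logic under stated assumptions: `X` is a (n_videos, 30) embedding matrix, `Y` a (n_videos, n_voxels) response matrix, and ridge regression with a fixed penalty stands in for whatever regularization and validation scheme the authors actually used.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_accuracy(X, Y, alpha=1.0, n_splits=5):
    """Cross-validated, voxel-wise Pearson r of a ridge encoding model
    predicting brain responses Y from embedding dimensions X."""
    r_sum = np.zeros(Y.shape[1])
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in kfold.split(X):
        pred = Ridge(alpha=alpha).fit(X[train], Y[train]).predict(X[test])
        for v in range(Y.shape[1]):  # correlate predicted vs. observed per voxel
            r_sum[v] += np.corrcoef(pred[:, v], Y[test, v])[0, 1]
    return r_sum / n_splits  # mean r across folds, one value per voxel
```

Under this scheme, "highest accuracy" means the MLLM-derived `X` yields larger cross-validated correlations in emotion-processing voxels than the LLM-derived or rating-derived alternatives.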